Blind separation of noisy Gaussian stationary sources. 
Application to cosmic microwave background imaging. 
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ABSTRACT 

We present a new source separation method which max- 
imizes the hkehhood of a model of noisy mixtures of 
stationary, possibly Gaussian, independent components. 
The method has been devised to address an astronom- 
ical imaging problem. It works in the spectral domain 
where, thanks to two simple approximations, the like- 
lihood assumes a simple form which is easy to handle 
(low dimensional sufficient statistics) and to maximize 
(via the EM algorithm). 

1 SOURCE SEPARATION for ASTRONOMY 

1.1 Astronomical components 

Source separation consists in recovering components 
from a set of observed mixtures. Component separa- 
tion is a topic of major interest to the Planck space 
mission, to be launched in 2007 by ESA to map the 
cosmic microwave background (CMB). The blackbody 
temperature of this radiation as a function of direc- 
tion on the sky will be measured in m = 10 differ- 
ent frequency channels, corresponding to wavelengths 
ranging from A = 350 microns to A = 1 cm. In each 
channel, the temperature map will show not only the 
CMB contribution but also contributions from other 
sources called foregrounds, among which Galactic dust 
emission, emission from very remote (and hence quasi 
point-like) galaxy clusters, and several others. It is ex- 
pected that (after some heavy pre-processing), the map 
built from the i-th channel can be accurately modeled as 
Vii'^) = Sj=i ^ij^ji^) + ^i(^) where Sj{r) is the spatial 
pattern for the j-th component and ni{r) is an addi- 
tive detector noise. In other words, cosmologists expect 
to observe a noisy instantaneous {i.e. non convolutive) 
mixture of essentially independent components (inde- 
pendence being the consequence of the physically dis- 
tinct origins of the various components). Even though 
recovering as cleanly as possible the CMB component 
is the primary goal of the mission, astrophysicists are 
also interested in the other components, in particular 
for collecting data regarding the morphology and physi- 
cal properties of Galactic foregrounds (dust. . . ) and the 
distribution of galaxy clusters. 



This paper deals with blind component separation. 
Blindness means recovering the components without 
knowing in advance the coefficients of the mixture. In 
practice, this may be achieved by resorting to the strong 
but often plausible assumption of mutual statistical in- 
dependence between the components. The motivation 
for a blind approach is obvious: even though some 
coefficients of the mixture may be known in advance 
with good accuracy (in particular those related to the 
CMB), some other components are less well known or 
predictable. It is thus very tempting to run blind algo- 
rithms which do not require a priori information about 
the mixture coefficients. 

1.2 Blind separation methods 

Several attempts at blind component separation for 
CMB imaging have already been reported. The first 
proposal, due to Baccigalupi et al. use a non Gaussian 
noise-free i.i.d. model for the components [1], hence fol- 
lowing the 'standard' path to source separation. One 
problem with this approach is that the most important 
component, namely the CMB itself, seems to closely fol- 
low a Gaussian distribution. It is well known that, in 
i.i.d. models, it is possible to accommodate at most one 
Gaussian component. It does not seem to be a good 
idea, however, to use a non Gaussian model when the 
main component itself has a Gaussian distribution. 

Another reason why the i.i.d. modeling (which is im- 
plicit in 'standard' ICA) probably is not appropriate to 
our application: most of the components are very much 
dominated by the low-frequency part of their spectra. 
Thus sample averages taken through the data set tend 
not to converge very quickly to their expected values. 
This may explain why Fourier methods, presented be- 
low, seem to perform better. 

Thus, rather than exploiting the non Gaussianity of 
(all but one of) the components, one may think of ex- 
ploiting their spectral diversity. A very sensible ap- 
proach is proposed by Pham: using the Whittle approx- 
imation of the likelihood, he shows that blind separa- 
tion can be achieved by jointly diagonalizing spectral 
covariance matrices computed over several frequency 
bands [3]. This conclusion however is reached only in 



the case of noise- free models. Therefore, it is not appro- 
priate for CMB imaging where a very significant amount 
of noise is expected. 

In this paper, we follow Pham's approach but we take 
additive noise into account, leading to a likelihood func- 
tion which is no longer a joint diagonality criterion, thus 
requiring some new algorithmics. We present below the 
form taken by the EM algorithm when applied to a set 
of spectral covariance matrices. This approach leads to 
an efficient algorithm, much faster than the algorithms 
obtained via the EM technique in the case of non Gaus- 
sian i.i.d. modeling as in [2] or [4]. 

1.3 A stationary Gaussian model 

Our method is obtained by starting from a stationary 
Gaussian model. For ease of exposition, we assume that 
the observations are m times series rather than m images 
(extension to images is straightforward). The m x 1- 
dimensional observed process y{t) = [yi{t);. . ym{t)] is 
modeled as 

y{t)=As{t)^n{t) (1) 

where ^4 is an m x n matrix with linearly independent 
columns. The n-dimensional source process s{t) (the 
components) and the m-dimensional noise process n{t) 
are modeled as real valued, mutually independent and 
stationary with spectra 5's(i/) and Sn{T^) respectively. 
The spectrum of the observed process then is 

Sy{iy)=ASs{iy)A'^ ^Sn{iy). (2) 

The t superscript denotes transconjugation even though 
transposition would be enough almost everywhere in 
this paper (our method is easily adapted to deal with 
complex signals/mixtures). The assumption of indepen- 
dence between components implies that Ss{i^) is a diag- 
onal matrix: 

where Pi(i^) is the power spectrum of the ith source 
at frequency u. For simplicity, we also assume that the 
observation noise is uncorrelated both in time and across 
sensors: 

[Sn{l^)]iJ = (3) 

meaning that the noise spectral density af on the ith 
detector does not depend on frequency z/. In summary 
the probability distribution of the process is fully speci- 
fied by m X n mixture coefficients, m noise levels and n 
power spectra. 

2 THE OBJECTIVE FUNCTION 

Our method boils down to adjusting smoothed versions 
of the spectral covariance matrices (2) to their empirical 
estimates. The estimated parameters are those which 
give the best match, as measured by an objective func- 
tion. This objective function is introduced in this sec- 
tion. In the following section, we show how it stems 
from the maximum likelihood principle. 



2.1 Spectral averaging. 

A key feature of our method is that it uses low dimen- 
sional statistics obtained as averages over spectral do- 
mains in Fourier space. These Fourier domains sim- 
ply are frequency bands in the ID case or are two- 
dimensional domains of the Fourier plane when the 
method is applied to images. 

Consider a partition of the frequency interval (— ^, ^) 
into Q domains (bands): (— ^, |) = U^^-^Vq which are 
required to be symmetric: f E Vq ^ —f G Vq. For any 
function /(i/) of frequency, denote {f)q its average over 
the q-th spectral domain when sampled at multiples of 
1/T: 

(/). = ^E/(|)- ^ = i>->Q (4) 

where Wq is the number of points in domain Vq. 

2.2 Spectral statistics 

Denoting Y(iy) the discrete-time Fourier transform of T 
samples: 

1 ^"^ 

^(^) = ^ E exp(-2i^z/t), (5) 
t=o 

the periodogram is Sy{i') =Y{iy)Y{i'y and its averaged 
version is 

(Sy), = {Y{,y)Y{u)% (6) 

Note that Y{-u) = F(i/)* for real data so that {Sy)q 
actually is a real valued matrix if Vq is a symmetric 
domain. 

This sample spectral covariance matrix will be our 
estimate for the corresponding averaged quantity 

(Sy), = A{S.),A^ + {Sn), (7) 

where the equality results from averaging model (2). 

A key point is that the structure of the model is not 
affected by spectral averaging since {Ss)q remains a di- 
agonal matrix after averaging: 

{Ss),=dmg[{P,)q,...,{Pn)q] 

and, of course, {Sn)q = Sn = diag((7i, . . . , cr^) still is a 
constant diagonal matrix. 

2.3 Blind identification via spectral matching 

Our proposal for blind identification simply is to match 
the sample spectral covariance matrices {Sy)q^ which 
depend on the data, to their theoretical values {Sy)q, 
which depend on the unknown parameters. There are 
mxn + Qxn + m of these parameters, ^ collectively 
referred to as 6>: 

^ There are actually n redundant parameters since a scale factor 
can be exchanged between each column of A and the correspond- 
ing power spectra. 



The mismatch between the sample statistics and their 
expected values is quantified by the average divergence: 

Q 

m=T.'^,D{{Sy),ASy),) (9) 
q=l 

where the positive weight Wq is (as above) proportional 
to the size of the q-th spectral domain and where D{-,-) 
is a measure of divergence between two m x m positive 
matrices defined as 

D{Ri,R2) =tT[RiR-^) -\ogdet{RiR^^) -m (10) 

This is nothing but the Kullback divergence between two 
n-dimensional zero-mean Gaussian distributions with 
positive covariance matrices Ri and R2 respectively. 

The reason for using the mismatch measure (9) is its 
connection to maximum likelihood principle (see below). 
Even though the divergence (9) may, at first sight, seem 
more difficult to deal with than a simple quadratic dis- 
tance, it is actually a better choice in at least two re- 
spects: first, we expect it to yield efficient parameters 
estimates because of the asymptotic optimality of max- 
imum likelihood estimation; second, thanks to its con- 
nection to the log-likelihood, it lends itself to simple 
optimization via the EM algorithm (see below). 

A last note: since domain averaging does not change 
the algebraic structure of the spectral covariance matri- 
ces {i.e. eq. (2) becomes (7)), it does not introduce any 
bias in the estimation of A. 

3 MAXIMUM LIKELIHOOD AND EM 

3.1 Whittle approximation 

The Whittle approximation is a spectral approximation 
to the likelihood of stationary processes. It has been in- 
troduced for the blind separation of noise-free mixtures 
by Pham [3]. Simplifying a little bit, this approximation 
boils down to asserting that the coefficients Y{v) of defi- 
nition (5) taken at frequencies v = p/T are uncorrelated, 
have zero-mean and a covariance matrix equal to Sy{iy). 
Simple computations then show that — up to a constant 
and a scalar factor — the (negative) log- likelihood of the 
data takes the form (9) under the additional approxima- 
tion that Sy{u) is constant over each spectral domain. 

The Whittle approximation is good for Gaussian pro- 
cesses but certainly does not capture all the probability 
structure, even for large T, for non Gaussian processes. 
However, it it still provides a principled way of exploit- 
ing the spectral structure of the process, leading to the 
selecting of (9) as an objective function. In addition, it 
suggests to use the EM algorithm for minimizing (9). 

3.2 An EM algorithm in the spectral domain 

Using the EM technique for maximizing a likelihood 
function requires defining latent (unobserved) data. In 
the case of source separation, there is an obvious choice: 
take the components as the latent data. This approach 



was introduced in [2] for a noisy non Gaussian i.i.d. 
model of source separation and later in [4] for a noisy 
non Gaussian model in the spectral domain. Both these 
models lead to heavy computation. In contrast, by i) 
using only a Gaussian model (the Whittle approxima- 
tion) and ii) averaging over spectral domains, the EM 
algorithm becomes very computationally attractive. 

Room is lacking for a complete derivation of the EM 
algorithm in our case but it is not difficult to adapt, 
for instance, the computations of [2] to our case. Let 
us only mention why the resulting algorithm is much 
simpler. 

First, when dealing with data structured diS y = As -\- 
n, EM needs to evaluate conditional expectations E{s\y) 
and E(ss^y). Thanks to the Gaussian model, these 
are readily found to be linear functions of y and yy^ 
respectively: 

E{s\y) = Wy (11) 
E{ss^y) = Wyy'^W'^^C (12) 

where matrices C and W are defined as 

C = {A^R-'A + Rj'r' (13) 
W = (^t^-i^ + ij-i)-i^t^-i (14) 

with covariance matrices Rs = Cov(5), Rn = Cov(n). 

Second, this linearity is preserved through domain av- 
eraging, meaning that the EM algorithm only needs to 
operate on the sample covariance matrices {Sy)q. This 
set of matrices is a sufficient statistic set in our model; 
it is also all that is needed to run the EM algorithm. 

Thus, blind separation of noisy mixtures of stationary 
sources can be achieved by computing the periodogram, 
averaging it into a set of sample covariance matrices and 
maximizing the likelihood by then running the EM al- 
gorithm. The algorithm is summarized in pseudo-code 
(see Alg. 1), but its derivation (which is purely mechan- 
ical) is omitted. In this pseudo-code, the diag(-) oper- 
ator sets to the off-diagonal elements of its argument. 
We also include a renormalization step which deals with 
the scale indetermination inherent to source separation: 
each column of A is normalized to have unit norm and 
the corresponding scale factor is applied to the average 
source spectra. 

4 APPLICATION AND COMMENTS 

4.1 Separating astrophysical components 

Preliminary tests have been carried on simulated ob- 
servations in six channels at microwave frequencies 100, 
143, 217, 353, 545 and 857 GHz, over sky patches of size 
12.5° X 12.5° sampled over a 300 x 300 pixel grid. Mix- 
tures include three astrophysical components (CMB, 
Galactic dust, and emission from galaxy clusters) and 
white noise. 

Since isotropic observations are expected, we choose 
spectral domains which are not only symmetric but also 



Start with sample covariance matrices {Sy)q^ and 

initial guesses for A, Sn and {Ss)q' 

repeat 

{E-step Compute conditional statistics} 

for = 1 to Q do 



Rvs{q) 



-Ca 



ysK^J — {Sy)qSn ^ 

RU<l) = C,A^S-\Sy),S-^AC, 
end for 

Rss = WqRss{q) 

Update the parameters} 



^yy — T 
{M-step 



(?=i 



^qRysiq) 

WqRyy{q) 



Rys^ss Rys) 



A — RysRgg 

Sn = diag {Ryy 

{Ss)q = dmg{Rss{q)) for 1 < ^ < Q. 
Renormalize A and the {Ss)q 
until convergence 



Alg. 1: Gaussian EM algorithm over spectral domains 

rotation invariant; in other words: spectral rings. The 
sample spectral covariance are computed over 15 such 
rings equally spaced over the whole band. Data reduc- 
tion thus is by a factor of (6 x 300 x 300)/(15 x 6 x 6) = 
1000. The EM algorithm converges in a few tens of 
iterations amounting to a few seconds on a 1 GHZ ma- 
chine when coded in octave (a free clone of Matlab: 
http : // www . oct ave . org) . 

Room is lacking for a detailed description of our ex- 
periments which will be reported elsewhere. See fig- 
ure 4.1 for an illustration with a typical (and significant) 
level of noise. A notable feature here is the separation of 
the galaxy clusters. The SNR on the CMB component is 
also much improved at high frequency even though this 
cannot be assessed from the picture. See the companion 
paper in these proceedings for more details. 

4.2 Conclusion 

We have proposed an efficient method to maximize the 
likelihood of a model of noisy mixtures of stationary 
sources by implementing the EM algorithm on spectral 
domains. The procedure jointly estimates all the pa- 
rameters: mixing matrix, average source spectra, noise 
level in each sensor. Spectral averaging offers large com- 
putational savings, especially when dealing with images. 

Since the inference principle is maximum likelihood 
for a 'smooth' Gaussian stationary model, we expect a 
good statistical efficiency when the source spectra are 
reasonably smooth (even though we saw little perfor- 
mance degradation in our experiments when using a 
very coarse Q = 2 spectral partition) and when the 
sources actually are Gaussian. In the CMB application, 
some components are very close to Gaussian (the CMB 
itself) while others are strongly non Gaussian; it is not 




Figure 1: Top two rows: maps at the detectors. Bottom 
row: components extracted with the Wiener filter based 
on the estimated parameters. 



clear yet how to best combine non Gaussian information 
with spectral diversity. 

As final note, we recall that, in the noise-free case, the 
ability to blindly separate Gaussian stationary compo- 
nents rests on spectral diversity: the spectra of any two 
sources should not be proportional. The noisy case is 
complicated by the fact that the noise parameters also 
have to be estimated. 

Future research should cover many issues: blind iden- 
tifiability in unknown noise, choice of the spectral do- 
mains, integration of non Gaussian information, inte- 
gration of prior information,. . . 
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